GeneWebEx: Gene Annotation Web Extraction, Aggregation, and Updating from Web-Based Biomolecular Databanks

Authors

  • Marco Masseroli
  • Andrea Stella
  • Natalia Meani
  • Myriam Alcalay
  • Francesco Pinciroli
Abstract

Numerous genomic annotations are stored in different web-accessible databanks. Scientists need to mine these databanks with user-defined queries, in batch mode, and to integrate the mined data in an orderly way within user-customizable working environments. Unfortunately, to date, most accessible databanks can be interrogated only for a single gene or protein at a time, and the retrieved data are generally available only as HTML pages. We developed GeneWebEx to mine data of interest from the HTML pages of different web-based databanks and to organize the extracted data for further analysis. GeneWebEx uses user-defined templates to identify the data to extract, then aggregates and structures them in a database designed to hold extractions from distinct biomolecular databanks. Moreover, a template-based module enables automatic updating of the extracted data. In validation tests, GeneWebEx allowed us to efficiently gather relevant annotations from various sources and to query them comprehensively to highlight significant biological characteristics.
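The template-driven workflow described in the abstract (identify fields in HTML pages, aggregate them in a database, update them on re-extraction) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the template format (a pair of literal markers around each field), the `annotation` table layout, the databank name, and the sample pages are all hypothetical and are not taken from GeneWebEx itself.

```python
import re
import sqlite3
from html import unescape

# Hypothetical template: each field is located by the literal text that
# precedes and follows it in the page. This format is an assumption,
# not GeneWebEx's actual template syntax.
TEMPLATE = {
    "symbol": ("<td>Symbol</td><td>", "</td>"),
    "chromosome": ("<td>Chromosome</td><td>", "</td>"),
}


def extract_fields(html, template):
    """Return {field: value} for every template field found in the page."""
    record = {}
    for field, (before, after) in template.items():
        pattern = re.escape(before) + r"(.*?)" + re.escape(after)
        match = re.search(pattern, html, flags=re.DOTALL)
        if match:
            # Strip nested tags, then decode HTML entities.
            text = re.sub(r"<[^>]+>", "", match.group(1))
            record[field] = unescape(text).strip()
    return record


def make_db():
    """Create an in-memory table keyed by (source, gene_id, field)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        "source TEXT, gene_id TEXT, field TEXT, value TEXT, "
        "PRIMARY KEY (source, gene_id, field))"
    )
    return conn


def store(conn, source, gene_id, record):
    """Insert new values; replace existing ones so re-extraction updates them."""
    conn.executemany(
        "INSERT OR REPLACE INTO annotation VALUES (?, ?, ?, ?)",
        [(source, gene_id, f, v) for f, v in record.items()],
    )


# Invented sample pages standing in for HTML fetched from a databank.
PAGES = {
    "1234": "<table><tr><td>Symbol</td><td><b>GENE_A</b></td></tr>"
            "<tr><td>Chromosome</td><td>7</td></tr></table>",
    "5678": "<table><tr><td>Symbol</td><td>GENE_B &amp; co</td></tr></table>",
}

conn = make_db()
for gene_id, html in PAGES.items():  # batch mode: one page per gene
    store(conn, "example_databank", gene_id, extract_fields(html, TEMPLATE))
```

Keying the table on source, gene, and field is one simple way to let a repeated extraction overwrite stale values instead of duplicating them; a real system would also need page fetching, error handling, and templates that are robust to layout changes.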


Similar resources

Genewebex: Gene Annotation Web Extraction, Aggregation, and Updating From Web-Interfaced Biomolecular Databanks

MARCO MASSEROLI∗,† ANDREA STELLA†, MYRIAM ALCALAY‡,§ and FRANCESCO PINCIROLI† †Bioengineering Department, Politecnico di Milano, Piazza Leonardo da Vinci, 32, I-20133 Milano, Italy ‡IEO — European Institute of Oncology, I-20141 Milano, Italy §IFOM — FIRC Institute of Molecular Oncology, I-20139 Milano, Italy ∗[email protected] http://www.biomed.polimi.it/BioIntro/english/people/researc...


MyWEST: My Web Extraction Software Tool for effective mining of annotations from web-based databanks

MOTIVATION High-throughput technologies create the necessity to mine large amounts of gene annotations from diverse databanks, and to integrate the resulting data. Most databanks can be interrogated only via Web, for a single gene at a time, and query results are generally available only in the HTML format. Although some databanks provide batch retrieval of data via FTP, this requires expertise...


Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we are inspired by the way a human user scans the page content for specific data. In particular, we use text fea...


Genomic Functional Investigation through Statistical Analysis of Protein Families and Domains

Protein families and domains represent a very relevant resource useful to understand protein functions and interactions among their codifying genes. To perform evaluations of gene annotations sparsely available in numerous different databanks accessible via Internet, we previously developed GFINDer, a Web server that performs statistical analysis of functional and phenotypic annotations of gene...


A Framework for Resource Annotation and Classification in Bioinformatics

Semantic annotation is commonly recognized as one of the cornerstones of the semantic Web. In the context of Web services, semantic annotations can support effective and efficient discovery of services, and guide their composition into workflows. Because semantic annotation is a time consuming and expensive task, (semi-)automatic approaches for semantic annotation extraction are required. In th...


Publication date: 2004